Text Entailment for Logical Segmentation and Summarization
Abstract
Summarization is the process of condensing a source text into a shorter version while preserving its information content ([2]). This paper presents original methods for extractive summarization of a single source document, based on an intuition that has not been explored until now: the logical structure of a text. The summarization relies on an original linear segmentation algorithm, which we call logical segmentation (LTT) because the score of a sentence is the number of sentences of the text that it entails. The summary is obtained by one of three methods: selecting the first sentence(s) of a segment, selecting the best-scored sentence(s) of a segment, or selecting the most informative sentence(s) of a segment (relative to those previously selected). Moreover, our methods allow the size of the derived summary to be adjusted dynamically, independently of the number of segments. Alternatively, we propose a Dynamic Programming (DP) method, based on the continuity principle and applied to sentences logically scored as above. This method first obtains the summary and then determines the segments. Our segmentation methods are applied and evaluated against the segmentation of the text “I spent the first 19 years” from Morris and Hirst ([17]). The original text is reproduced at [26]. We report statistics on the informativeness of summaries of different lengths, obtained with the above methods, relative to the original (summarized) text. These statistics indicate that performing segmentation before summarization can improve the quality of the resulting summaries.
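The scoring rule and the first two selection methods described in the abstract can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: `toy_entails` is a hypothetical word-containment stand-in for a real entailment recognizer, a sentence is not counted as entailing itself, and the segment boundaries are supplied by the caller rather than computed by the logical segmentation algorithm. The third selection method and the DP variant are not covered.

```python
from typing import Callable, List, Sequence, Tuple


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an entailment recognizer (an assumption):
    the premise 'entails' the hypothesis if it contains all of its words."""
    premise_words = set(premise.lower().split())
    hypothesis_words = set(hypothesis.lower().split())
    return hypothesis_words <= premise_words


def logical_scores(
    sentences: Sequence[str],
    entails: Callable[[str, str], bool] = toy_entails,
) -> List[int]:
    """Score of sentence i = number of other sentences it entails.
    Excluding the sentence itself is an assumption of this sketch."""
    return [
        sum(1 for j, other in enumerate(sentences)
            if j != i and entails(sentence, other))
        for i, sentence in enumerate(sentences)
    ]


def select_per_segment(
    segments: Sequence[Tuple[int, int]],
    scores: Sequence[int],
    method: str = "best",
) -> List[int]:
    """Return one sentence index per segment.
    segments: (start, end) index pairs, end exclusive, assumed to be given.
    method: "first" picks the first sentence, "best" the highest-scored one."""
    picked = []
    for start, end in segments:
        if method == "first":
            picked.append(start)
        else:
            picked.append(max(range(start, end), key=lambda i: scores[i]))
    return picked


sentences = [
    "the cat sat",
    "the cat sat on the mat",
    "the mat",
    "dogs bark",
    "dogs bark loudly",
]
scores = logical_scores(sentences)
segments = [(0, 3), (3, 5)]  # illustrative boundaries, not computed here
best = select_per_segment(segments, scores, method="best")
first = select_per_segment(segments, scores, method="first")
```

Passing a different `entails` callable, for example a wrapper around a trained entailment model, would leave the scoring and selection code unchanged.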
Similar resources
Top-Down Cohesion Segmentation in Summarization
The paper proposes a new method of linear text segmentation based on the lexical cohesion of a text. Namely, first a single chain of disambiguated words in a text is established, then the rips of this single chain are considered as boundaries for the segments of the cohesion text structure (Cohesion TextTiling or CTT). The summaries of arbitrary length are obtained by extraction using three diffe...
Recognizing Textual Entailment Using Description Logic and Semantic Relatedness
Reda Siblini, Ph.D., Concordia University, 2014. Textual entailment (TE) is a relation that holds between two pieces of text where one reading the first piece can conclude that the second is most likely true. Accurate approaches for textual entailment can be beneficial to various natural language processing (NLP) appl...
Text Summarization through Entailment-based Minimum Vertex Cover
Sentence connectivity is a textual characteristic that may be incorporated intelligently for the selection of sentences of a well-meaning summary. However, the existing summarization methods do not utilize its potential fully. The present paper introduces a novel method for single-document text summarization. It poses the text summarization task as an optimization problem, and attempts to solve ...
A Text Summarization Approach under the Influence of Textual Entailment
This paper presents how text summarization can be influenced by textual entailment. We show that if we use textual entailment recognition together with a text summarization approach, we achieve good results for final summaries, obtaining an improvement of 6.78% with respect to the summarization approach only. We also compare the performance of this combined approach to two baselines (the one prov...
An Effective Sentence Ordering Approach For Multi-Document Summarization Using Text Entailment
With the rapid development of modern technology, the amount of electronically available textual information has increased considerably. Manually summarizing textual information from unstructured text sources creates overhead for the user; therefore, a systematic approach is required. Summarization is an approach that focuses on providing the user with a condensed version of the original text bu...